Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data

Author: Keren Gil
Schuller Björn
Publication date: 01/01/2016

Traditional convolutional layers extract features from patches of data by applying a non-linearity to an affine function of the input. We propose a model that enhances this feature extraction process for sequential data by feeding patches of the data into a recurrent neural network and using the outputs or hidden states of the recurrent units to compute the extracted features. By doing so, we exploit the fact that a window containing a few frames of the sequential data is a sequence itself, and this additional structure might encapsulate valuable information. In addition, we allow for more steps of computation in the feature extraction process, which is potentially beneficial because an affine function followed by a non-linearity can result in features that are too simple. Using our convolutional recurrent layers, we obtain an improvement in performance on two audio classification tasks compared to traditional convolutional layers. TensorFlow code for the convolutional recurrent layers is publicly available at https://github.com/cruvadom/Convolutional-RNN
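The windowing idea in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' TensorFlow implementation: the plain tanh recurrent cell, the use of the final hidden state as the feature vector, and all names (`rnn_final_state`, `conv_recurrent_layer`, the weight matrices) are assumptions made for illustration only.

```python
import math
import random


def rnn_final_state(window, w_in, w_rec, bias):
    """Run a simple tanh recurrent cell over one window of frames.

    Returns the hidden state after the last frame (an assumed choice;
    the abstract allows outputs or hidden states to be used).
    """
    hidden = [0.0] * len(bias)
    for frame in window:
        new_hidden = []
        for j in range(len(bias)):
            total = bias[j]
            total += sum(w_in[j][i] * frame[i] for i in range(len(frame)))
            total += sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden
    return hidden


def conv_recurrent_layer(sequence, window_size, stride, w_in, w_rec, bias):
    """Slide a window over the sequence and extract one feature vector per
    window by running the same recurrent cell (shared weights) over it."""
    features = []
    for start in range(0, len(sequence) - window_size + 1, stride):
        window = sequence[start:start + window_size]
        features.append(rnn_final_state(window, w_in, w_rec, bias))
    return features


# Hypothetical toy setup: 10 frames of 3 values each, 4 hidden units.
rng = random.Random(0)
frame_dim, hidden_dim = 3, 4
sequence = [[rng.uniform(-1, 1) for _ in range(frame_dim)] for _ in range(10)]
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(frame_dim)] for _ in range(hidden_dim)]
w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
bias = [0.0] * hidden_dim

features = conv_recurrent_layer(sequence, window_size=3, stride=1,
                                w_in=w_in, w_rec=w_rec, bias=bias)
print(len(features), len(features[0]))
```

A standard convolutional layer would replace `rnn_final_state` with a single affine map of the flattened window followed by a non-linearity; the sketch differs only in processing the frames of each window one step at a time.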

arXiv.org e-Print Archive

OPUS Augsburg

Crossref

The Many-to-Many Mapping Between the Concordance Correlation Coefficient and the Mean Square Error

Author: Pandit Vedhas
Schuller Björn
Publication date: 03/03/2020

We derive the mapping between two of the most pervasive utility functions, the mean square error ($MSE$) and the concordance correlation coefficient (CCC, $\rho_c$). Despite its drawbacks, $MSE$ is one of the most popular performance metrics (and a loss function), lately joined by $\rho_c$ in many of the sequence prediction challenges. Despite the ever-growing simultaneous usage, e.g., inter-rater agreement, assay validation, a mapping between the two metrics is missing to date. While minimisation of the $L_p$ norm of the errors or of its positive powers (e.g., $MSE$) is aimed at $\rho_c$ maximisation, we reason the often-witnessed ineffectiveness of this popular loss function with graphical illustrations. The discovered formula uncovers not only the counterintuitive revelation that '$MSE_1<MSE_2$' does not imply '$\rho_{c_1}>\rho_{c_2}$', but also provides the precise range for the $\rho_c$ metric for a given $MSE$. We discover the conditions for $\rho_c$ optimisation for a given $MSE$ and, as a logical next step, for a given set of errors. We generalise and discover the conditions for any given $L_p$ norm, for an even $p$. We present newly discovered, albeit apparent, mathematical paradoxes. The study inspires and anticipates a growing use of $\rho_c$-inspired loss functions, e.g., $\left|\frac{MSE}{\sigma_{XY}}\right|$, replacing the traditional $L_p$-norm loss functions in multivariate regressions.

Comment: Why this discovery, or the mapping formulation is important: MSE1 < MSE2 does not imply CCC1 > CCC2. In other words, MSE minimisation does not necessarily guarantee CCC maximisation.
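The claim that a lower MSE does not imply a higher CCC can be checked numerically. The sketch below uses the standard definitions with population (1/N) moments and a toy data set chosen for illustration; the data, the function names, and the identity check are not taken from the paper, whose mapping formula may be stated differently.

```python
def mean(values):
    return sum(values) / len(values)


def mse(pred, gold):
    """Mean square error between predictions and gold standard."""
    return mean([(p - g) ** 2 for p, g in zip(pred, gold)])


def moments(pred, gold):
    """Population variances, covariance, and squared mean difference."""
    mu_p, mu_g = mean(pred), mean(gold)
    var_p = mean([(p - mu_p) ** 2 for p in pred])
    var_g = mean([(g - mu_g) ** 2 for g in gold])
    cov = mean([(p - mu_p) * (g - mu_g) for p, g in zip(pred, gold)])
    return var_p, var_g, cov, (mu_p - mu_g) ** 2


def ccc(pred, gold):
    """Concordance correlation coefficient (population moments)."""
    var_p, var_g, cov, mean_gap_sq = moments(pred, gold)
    return 2 * cov / (var_p + var_g + mean_gap_sq)


# Hypothetical toy data: prediction A is constant, prediction B overshoots.
gold = [-1.0, 1.0, -1.0, 1.0]
pred_a = [0.0, 0.0, 0.0, 0.0]
pred_b = [-2.5, 2.5, -2.5, 2.5]

mse_a, mse_b = mse(pred_a, gold), mse(pred_b, gold)
ccc_a, ccc_b = ccc(pred_a, gold), ccc(pred_b, gold)
print(f"A: MSE={mse_a:.3f}, CCC={ccc_a:.3f}")
print(f"B: MSE={mse_b:.3f}, CCC={ccc_b:.3f}")

# Since MSE = var_p + var_g - 2*cov + (mean gap)^2, the CCC denominator
# equals MSE + 2*cov, which ties the two metrics together.
cov_b = moments(pred_b, gold)[2]
ccc_b_from_mse = 2 * cov_b / (mse_b + 2 * cov_b)
```

Prediction A has the lower MSE but a CCC of zero, because a constant prediction has no covariance with the gold standard, while prediction B has the higher MSE and the higher CCC.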

arXiv.org e-Print Archive

OPUS Augsburg

Scaling Speech Enhancement in Unseen Environments with Noise Embeddings

Author: Han Jing
Keren Gil
Schuller Björn
Publication date: 01/01/2018

We address the problem of speech enhancement generalisation to unseen environments by performing two manipulations. First, we embed an additional recording from the environment alone, and use this embedding to alter activations in the main enhancement subnetwork. Second, we scale the number of noise environments present at training time to 16,784 different environments. Experimental results show that both manipulations reduce word error rates of a pretrained speech recognition system and improve enhancement quality according to a number of performance measures. Specifically, our best model reduces the word error rate from 34.04% on noisy speech to 15.46% on the enhanced speech. Enhanced audio samples can be found at https://speechenhancement.page.link/samples

arXiv.org e-Print Archive

OPUS Augsburg

Crossref

Calibrated Prediction Intervals for Neural Network Regressors

Author: Cummins Nicholas
Keren Gil
Schuller Björn
Publication date: 01/01/2018

Ongoing developments in neural network models are continually advancing the state of the art in terms of system accuracy. However, the predicted labels should not be regarded as the only core output; also important is a well-calibrated estimate of the prediction uncertainty. Such estimates and their calibration are critical in many practical applications. Despite their advantage in accuracy, contemporary neural networks can generally be regarded as poorly calibrated and as such do not produce reliable output probability estimates. Further, while post-processing calibration solutions can be found in the relevant literature, these tend to be for systems performing classification. In this regard, we herein present two novel methods for acquiring calibrated prediction intervals for neural network regressors: empirical calibration and temperature scaling. In experiments using different regression tasks from the audio and computer vision domains, we find that both our proposed methods are indeed capable of producing calibrated prediction intervals for neural network regressors with any desired confidence level, a finding that is consistent across all datasets and neural network architectures we experimented with. In addition, we derive a practical recommendation for producing more accurate calibrated prediction intervals. The source code implementing our proposed methods for computing calibrated prediction intervals is publicly available.
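The abstract does not spell out how empirical calibration works, so the following is only one plausible reading, sketched under stated assumptions: a symmetric interval around each point prediction, with a single half-width taken from the sorted absolute errors on a held-out calibration set. The function names and the synthetic data are hypothetical, and the paper's actual procedure may differ.

```python
import math
import random


def calibrate_half_width(predictions, targets, confidence):
    """Pick the smallest half-width that covers at least `confidence`
    of the calibration examples (assumed symmetric, global interval)."""
    abs_errors = sorted(abs(p - t) for p, t in zip(predictions, targets))
    rank = math.ceil(confidence * len(abs_errors))
    rank = min(max(rank, 1), len(abs_errors))
    return abs_errors[rank - 1]


def prediction_interval(prediction, half_width):
    """Interval reported for one new point prediction."""
    return (prediction - half_width, prediction + half_width)


def coverage(predictions, targets, half_width):
    """Fraction of targets that fall inside their prediction interval."""
    hits = sum(1 for p, t in zip(predictions, targets)
               if abs(p - t) <= half_width)
    return hits / len(targets)


# Synthetic calibration set standing in for a regressor's held-out outputs.
rng = random.Random(0)
targets = [rng.uniform(0.0, 10.0) for _ in range(200)]
predictions = [t + rng.gauss(0.0, 1.0) for t in targets]

confidence = 0.9
half_width = calibrate_half_width(predictions, targets, confidence)
calibration_coverage = coverage(predictions, targets, half_width)
low, high = prediction_interval(5.0, half_width)
print(f"half-width={half_width:.3f}, coverage={calibration_coverage:.3f}")
```

By construction the interval covers at least the requested fraction of the calibration set; whether that coverage carries over to new data depends on the calibration set being representative, which this sketch does not test.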

arXiv.org e-Print Archive

OPUS Augsburg

Crossref

Fast Single-Class Classification and the Principle of Logit Separation

Author: Keren Gil
Sabato Sivan
Schuller Björn
Publication date: 17/09/2018

We consider neural network training, in applications in which there are many possible classes, but at test-time, the task is a binary classification task of determining whether the given example belongs to a specific class, where the class of interest can be different each time the classifier is applied. For instance, this is the case for real-time image search. We define the Single Logit Classification (SLC) task: training the network so that at test-time, it would be possible to accurately identify whether the example belongs to a given class in a computationally efficient manner, based only on the output logit for this class. We propose a natural principle, the Principle of Logit Separation, as a guideline for choosing and designing losses suitable for the SLC. We show that the cross-entropy loss function is not aligned with the Principle of Logit Separation. In contrast, there are known loss functions, as well as novel batch loss functions that we propose, which are aligned with this principle. In total, we study seven loss functions. Our experiments show that indeed in almost all cases, losses that are aligned with the Principle of Logit Separation obtain at least 20% relative accuracy improvement in the SLC task compared to losses that are not aligned with it, and sometimes considerably more. Furthermore, we show that fast SLC does not cause any drop in binary classification accuracy, compared to standard classification in which all logits are computed, and yields a speedup which grows with the number of classes. For instance, we demonstrate a 10x speedup when the number of classes is 400,000. TensorFlow code for optimizing the new batch losses is publicly available at https://github.com/cruvadom/Logit Separation.

Comment: Published as a conference paper in ICDM 201
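Why cross-entropy can fail for single-logit decisions can be seen from a small numeric sketch. It relies only on the fact that softmax depends on differences between logits, not on their absolute values; the two toy examples, the helper names, and the linear output layer in `single_logit` are assumptions for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Cross-entropy loss of one example given its true class index."""
    return -math.log(softmax(logits)[true_class])


def single_logit(features, class_weights, class_bias):
    """Compute the logit of one class only (assumed linear output layer),
    avoiding the cost of computing the logits of all other classes."""
    return sum(w * x for w, x in zip(class_weights, features)) + class_bias


# Two hypothetical examples, both classified correctly with low loss.
logits_1, true_1 = [5.0, 9.0, 1.0], 1
logits_2, true_2 = [-3.0, -7.0, -8.0], 0
loss_1 = cross_entropy(logits_1, true_1)
loss_2 = cross_entropy(logits_2, true_2)

# A false-class logit from example 1 versus the true-class logit of example 2.
false_logit = logits_1[0]
true_logit = logits_2[true_2]
print(f"losses: {loss_1:.4f}, {loss_2:.4f}")
print(f"false logit {false_logit} vs true logit {true_logit}")

# Single-logit scoring of one class for a toy feature vector.
score = single_logit([1.0, 2.0], [0.5, -0.25], 0.1)
```

Both examples have low cross-entropy, yet a false-class logit in the first exceeds the true-class logit in the second, so no single threshold on an individual logit separates true from false classes across examples. This is the kind of situation a loss aligned with logit separation is meant to rule out.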

arXiv.org e-Print Archive

Crossref

Editorial: IEEE Transactions on Affective Computing: Challenges and Chances

Author: Schuller Björn
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

OPUS Augsburg

Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives

Author: Cummins Nicholas
Han Jing
Schuller Björn
Zhang Zixing
Publication date: 21/09/2018

Over the past few years, adversarial training has become an extremely active research topic and has been successfully applied to various Artificial Intelligence (AI) domains. Because it is a potentially crucial technique for the development of the next generation of emotional AI systems, we herein provide a comprehensive overview of the application of adversarial training to affective computing and sentiment analysis. Various representative adversarial training algorithms are explained and discussed accordingly, aimed at tackling diverse challenges associated with emotional AI systems. Further, we highlight a range of potential future research directions. We expect that this overview will help facilitate the development of adversarial training for affective computing and sentiment analysis in both the academic and industrial communities.

arXiv.org e-Print Archive

OPUS Augsburg